Abstract

The recent genealogical history of human populations is a complex mosaic formed by individual migration, 
large-scale population movements, and other demographic events. Population genomics datasets can provide 
a window into this recent history, as rare traces of recent shared genetic ancestry are detectable due to long 
segments of shared genomic material. We make use of genomic data for 2,257 Europeans (the POPRES 
dataset) to conduct one of the first surveys of recent genealogical ancestry over the past three thousand 
years at a continental scale. We detected 1.9 million shared genomic segments, and used the lengths of 
these to infer the distribution of shared ancestors across time and geography. We find that a pair of modern 
Europeans living in neighboring populations share around 10-50 genetic common ancestors from the last 
1500 years, and upwards of 500 genetic ancestors from the previous 1000 years. These numbers drop off 
exponentially with geographic distance, but since genetic ancestry is rare, individuals from opposite ends of 
Europe are still expected to share millions of common genealogical ancestors over the last 1000 years. There is 
substantial regional variation in the number of shared genetic ancestors: especially high numbers of common 
ancestors between many eastern populations likely date to the Slavic and/or Hunnic expansions, while much 
lower levels of common ancestry in the Italian and Iberian peninsulas may indicate weaker demographic 
effects of Germanic expansions into these areas and/or more stably structured populations. Recent shared 
ancestry in modern Europeans is ubiquitous, and clearly shows the impact of both small-scale migration and 
large historical events. Population genomic datasets have considerable power to uncover recent demographic 
history, and will allow a much fuller picture of the close genealogical kinship of individuals across the world. 



Author Summary 

Few of us know our family histories more than a few generations back. Therefore, it is easy to overlook 
the fact that we are all distant cousins, related to one another via a vast network of relationships. Here 
we use genome-wide data from European individuals to investigate these relationships over the past three 
thousand years, by looking for long stretches of shared genome between pairs of individuals inherited from 
common genetic ancestors. We quantify this ubiquitous recent common ancestry, showing, for instance, that
even pairs of individuals on opposite sides of Europe share hundreds of genetic common ancestors over this
time period. Despite this degree of commonality, there are also striking regional differences. For instance,
southeastern Europeans share large numbers of common ancestors which date to the era of the Slavic and
Hunnic expansions around 1,500 years ago, while most common ancestors that Italians share with other
populations lived longer ago than 2,500 years. The study of long stretches of shared genetic material holds
the promise of rich information about many aspects of recent population history.

1 Introduction



Even seemingly unrelated humans are distant cousins to each other, as all members of a species are related 
to each other through a vastly ramified family tree (their pedigree). We can see traces of these relationships 
in genetic data when individuals inherit shared genetic material from a common ancestor. Traditionally, 
population genetics has studied the distant bulk of these genetic relationships, which in humans typically 
date from hundreds of thousands of years ago (e.g. Cann et al. 1987; Takahata 1993). Such studies have
provided deep insights into the origins of modern humans (e.g. Li and Durbin 2011), and into recent
admixture between diverged populations (e.g. Moorjani et al. 2011; Henn et al. 2012a).



Although most such genetic relationships among individuals are very old, some individuals are related
on far shorter time scales. Indeed, given that each individual has 2^n ancestors from n generations ago,
theoretical considerations suggest that all humans are related genealogically to each other over surprisingly
short time scales (Chang 1999; Rohde et al. 2004). We are usually unaware of these close genealogical ties,
as few of us have knowledge of family histories more than a few generations back, and these ancestors often
do not contribute genetic material to us (Donnelly 1983). However, in large samples we can hope to identify
genetic evidence of more recent relatedness, and so obtain insight into the population history of the past
tens of generations. Here we investigate patterns of recent relatedness in a large European dataset.

The past several thousand years are replete with events that may have had significant impact on modern 
European relatedness, from the Neolithic expansion of farming to the Roman empire and the much more 
recent expansions of the Slavs and the Vikings. Our current understanding of these events is deduced 
from archaeological, linguistic, cultural, historical, and genetic evidence, with widely varying degrees of 
certainty. However, the demographic and genealogical impact of these events is still uncertain, with some 
theories holding that group identity was fluid to the point that such groups would have maintained very 
little coherence in terms of ancestry (Gillett 2006). Genetic data describing the breadth of genealogical
relationships can therefore add another dimension to our understanding of these historical events.

Work from uniparentally inherited markers (mtDNA and Y chromosomes) has improved our understanding
of human demographic history (e.g. Soares et al. 2010). However, interpretation of these markers is
difficult since they only record a single lineage of each individual (the maternal and paternal lineages,
respectively), rather than the entire distribution of ancestors. Genome-wide genotyping and sequencing
datasets have the potential to provide a much richer picture of human history, as we can learn
simultaneously about the diversity of ancestors that contributed to each individual's genome.



A number of studies have begun to reveal quantitative insights into recent human history (Novembre and
Ramachandran 2011). Within Europe, the first two principal axes of variation of the matrix of genotypes
are closely related to a rotation of latitude and longitude (Menozzi et al. 1978; Novembre et al. 2008; Lao
et al. 2008), as would be expected if patterns of ancestry are mostly shaped by local migration (Novembre
and Stephens 2008). Other work has revealed a slight decrease in diversity running from south-to-north in
Europe (Nelson et al. 2012), and the lowest haplotype diversity in England and Ireland (O'Dushlaine et al.
2010). However, we currently have little sense of the time scale of the historical events underlying these
geographic patterns, nor the degrees of genealogical relatedness they imply.

In this paper, we analyze those rare long chunks of genome that are shared between pairs of individuals 
due to inheritance from recent common ancestors, to obtain a detailed view of the geographic structure of 
recent relatedness. To determine the time scale of these relationships, we develop methodology that uses the 
lengths of shared genomic segments to infer the distribution of the ages of these recent common ancestors. 
We find that geographically distant Europeans share ubiquitous common ancestry even within the past
1,000 years, and show that common ancestry from the past 3,000 years is a result of both local migration
and large-scale historical events.



1.1 Definitions: Genetic ancestry and identity by descent 

Figure 1: (A) A hypothetical portion of the pedigree relating two sampled individuals, which shows six
of their shared genealogical common ancestors, with the portions of ancestral chromosomes from which the
sampled individuals have inherited shaded grey. The IBD blocks they have inherited from the two genetic
common ancestors are colored red, and the blue arrow denotes the path through the pedigree along which
one of these IBD blocks was inherited. (B) Cartoon of the spatial locations of ancestors of two individuals
- circle size is proportional to likelihood of genetic contribution, and shared ancestors are marked in grey.
Note that common ancestors are likely located between the two, and their distribution becomes more diffuse
further back in time.

Genetic data can only hope to inform us about those ancestors from whom a pair of individuals have
both inherited a common genomic region, in which case the ancestor is a "genetic common ancestor",
and the region is shared "identical by descent" (IBD) by the two. We define an "IBD block" to be a
contiguous segment of genome inherited (on at least one chromosome) from a shared common ancestor
without intervening recombination (see figure 1A). Historically, a more usual definition of IBD restricts to
those segments inherited from some prespecified set of "founder" individuals (e.g. Fisher 1954; Donnelly
1983; Chapman and Thompson 2002). We differ because we allow ancestors to be arbitrarily far back in
time. Under our definition, everyone is IBD everywhere, but mostly on very short, old segments (Powell
et al. 2010). We measure lengths of IBD segments in units of Morgans (M) or centiMorgans (cM), where
1 Morgan is defined to be the distance over which an average of one recombination (i.e. a crossover) occurs
per meiosis. Segments of IBD are broken up over time by recombination, which implies that older shared
ancestry tends to result in shorter shared IBD blocks.
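
To make the link between ancestor age and block length concrete, the following minimal sketch uses a
simplified model that is an assumption of this example rather than something stated above: a block
inherited from a common ancestor n generations back has passed through 2n meioses, so (ignoring
chromosome ends and crossover interference) its length is roughly exponential with mean 1/(2n) Morgans.
The function name and the 30-year generation time are illustrative choices.

```python
CM_PER_MORGAN = 100.0

def mean_block_length_cm(generations):
    """Mean IBD block length in cM for a common ancestor `generations` back.

    Assumption: the two sampled individuals are separated by 2 * generations
    meioses, each adding crossovers at rate 1 per Morgan, so block length is
    roughly exponential with mean 1 / (2 * generations) Morgans.
    """
    return CM_PER_MORGAN / (2 * generations)

YEARS_PER_GENERATION = 30  # assumed for illustration only

for years_ago in (500, 1500, 2500):
    generations = round(years_ago / YEARS_PER_GENERATION)
    # Prints: years ago, generations back, mean block length in cM.
    print(years_ago, generations, mean_block_length_cm(generations))
```

Under these assumptions the mean length shrinks in inverse proportion to the age of the ancestor, which
is the sense in which older shared ancestry leaves shorter blocks.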

Sufficiently long segments of IBD can be identified as long, contiguous regions over which the two
individuals are identical (or nearly identical) at a set of Single Nucleotide Polymorphisms (SNPs) which
segregate in the population. Formal, model-based methods to infer IBD are only computationally feasible
for very recent ancestry (e.g. Brown et al. 2012), but recently, fast heuristic algorithms have been developed
that can be applied to thousands of samples typed on genotyping chips (e.g. Browning and Browning 2011;
Gusev et al. 2009).



The relationship between numbers of long, shared segments of genome, numbers of genetic common
ancestors, and numbers of genealogical common ancestors can be difficult to envision. Since everyone has
exactly two biological parents, every individual has exactly 2^n paths of length n leading back through their
pedigree, each such path ending in a grand^(n-1)parent. However, due to Mendelian segregation and limited
recombination, genetic material will only be passed down along a small subset of these paths (Donnelly
1983). As n grows, these paths proliferate rapidly and so the genealogical paths of two individuals soon
overlap significantly. (These points are illustrated in figure 1.) By observing the number of shared
genomic blocks, we learn about the number of paths through the pedigree along which both individuals have
inherited genetic material.
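
The rapid proliferation of pedigree paths can be checked with a little arithmetic. The sketch below (the
function name and the population sizes are hypothetical, chosen only for illustration) finds the first
generation at which the 2^n paths of a single individual outnumber a population of a given size, so that
some ancestors must be reached along more than one path.

```python
def generations_until_paths_exceed(population_size):
    """Return the smallest n such that 2**n > population_size."""
    n = 0
    while 2 ** n <= population_size:
        n += 1
    return n

# Illustrative population sizes only; these are not estimates from the data.
for size in (10 ** 6, 10 ** 9):
    print(size, generations_until_paths_exceed(size))
```

This counts genealogical paths only; as noted above, genetic material is passed down along just a small
subset of them.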

At least one parent of each genetic common ancestor of two individuals is also a genetic common ancestor, 
but the relevant quantity is the time back to the most recent such common genetic ancestor. For this reason, 
when we say "genetic common ancestor" or "rate of genetic common ancestry", we are referring to only the
most recent genetic common ancestors from which the individuals in question inherited their shared segments
of genome. This quantity can also be more intuitive - for instance, a randomly mating population of constant
size will have a constant rate of appearance of most recent genetic common ancestors (the coalescent rate,
Hudson 1990). The total number of unqualified genetic common ancestors is lower bounded by the total
number of most recent genetic common ancestors that have so far appeared; and since we deal with fairly
small numbers of generations and fairly short pieces of genome, the true number of genetic ancestors only
differs by a small factor.



2 Results 



We applied the fastIBD method (with some modifications), implemented in BEAGLE (Browning and
Browning 2011), to the POPRES dataset (Nelson et al. 2008), which includes language and country-of-origin
data for several thousand Europeans genotyped at 500,000 SNPs. Our simulations showed that we have good
power to detect long IBD blocks (probability of detection 50% for blocks longer than 2cM, rising to 98%
for blocks longer than 4cM), and a low false positive rate (see figure 6 below). We restricted our analysis
to individuals who reported all grandparents from the same European country (so this is who we refer to
as "Europeans"). After removing obvious outlier individuals and close relatives, we were left with 2,257
individuals, whom we grouped using reported country of origin and language into 40 populations, listed with
sample sizes and average IBD levels in table 1. For geographic analyses, we located each population at the
most populous city in the appropriate region. Pairs of individuals in this dataset were found to share a
total of 1.9 million segments of IBD, an average of 0.74 per pair of individuals, or 831 per individual. The
mean length of these blocks was 2.5cM, the median was 2.1cM, and the 25th and 75th percentiles were 1.5cM
and 2.9cM respectively. The majority of pairs sharing IBD shared only a single block of IBD (94%). Between
30% and 250% of the genome of each individual is covered by blocks of IBD.



The observed genomic density of long IBD blocks (per cM) can be affected by recent selection (Albrechtsen
et al. 2010) and by recombination modifiers. We find that the local density of IBD blocks of all lengths
is relatively constant across the genome, but in certain regions the length distribution is systematically
perturbed (see supplemental figure S1), including around certain centromeres and the large inversion on
chromosome 8 (Giglio et al. 2001), also seen by Albrechtsen et al. (2010). Somewhat surprisingly, the
MHC does not show an unusual pattern of IBD, despite having shown up in other genomic scans for IBD
(Albrechtsen et al. 2010; Gusev et al. 2012). However, there are a few other regions where differences in
IBD rate are not predicted by differences in SNP density. Notably, there are two regions, on chromosomes
15 and 16, which are nearly as extreme in their deviations in IBD as the inversion on chromosome 8, and
may also correspond to large inversions segregating in the sample. These only make up a small portion of
the genome, and do not significantly affect our other analyses; we leave further analysis for future work.



2.1 Substructure and recent migrants 

We should expect significant within-population variability, as modern countries are relatively recent
constructions of diverse assemblages of languages and heritages. To assess the uniformity of ancestry within
populations, we used a permutation test to measure, for each pair of populations x and y, the uniformity
with which relationships with x are distributed across individuals from y. Most comparisons show
statistically significant substructure (supplemental figure S2). A notable exception is that nearly all
populations showed no significant partitioning of numbers of common ancestors with Italian samples,
suggesting that most common ancestors shared with Italy lived longer ago than the time that structure
within modern-day countries formed.
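
The exact statistic and permutation scheme are not spelled out in this section, so the following is only
one plausible sketch of such a test, not the procedure actually used. It assumes that, for a fixed
population x, we have the number of IBD blocks that each individual from y shares with x, takes the
variance of those counts as the test statistic, and compares it with the variance obtained when the same
total number of blocks is reassigned uniformly at random among the individuals of y. All names are
hypothetical.

```python
import random
from statistics import pvariance

def substructure_pvalue(counts, n_perm=1000, seed=0):
    """One-sided permutation p-value for excess variance in block counts.

    counts[i] is the number of IBD blocks that individual i from population y
    shares with population x. Under the null hypothesis of no substructure,
    each block is equally likely to belong to any individual of y.
    """
    rng = random.Random(seed)
    n = len(counts)
    total = sum(counts)
    observed = pvariance(counts)
    hits = 0
    for _ in range(n_perm):
        simulated = [0] * n
        for _ in range(total):
            simulated[rng.randrange(n)] += 1  # reassign one block at random
        if pvariance(simulated) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)

# Synthetic example: a sample split into low-sharing and high-sharing halves.
print(substructure_pvalue([0] * 10 + [20] * 10, n_perm=200))
```

A small p-value indicates that sharing with x is spread less evenly across individuals of y than random
assignment would produce, which is what the text calls substructure.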
Group  Population       Abbr.  n     self   other
E      Albania          AL     9     14.5   1.7
E      Austria          AT     14    1.3    0.9
E      Bosnia           BO     9     4.1    1.6
E      Bulgaria         BG     1     -      1.3
E      Croatia          HR     9     2.8    1.6
E      Czech Republic   CZ     9     2.1    1.3
E      Greece           EL     5     1.8    0.9
E      Hungary          HU     19    1.9    1.2
E      Kosovo           KO     15    9.9    1.7
E      Montenegro       ME     1     -      1.8
E      Macedonia        MA     4     2.5    1.4
E      Poland           PL     22    3.8    1.5
E      Romania          RO     14    2.1    1.2
E      Russia           RU     6     4.3    1.4
E      Slovenia         SI     2     5.0    1.3
E      Serbia           RS     11    2.7    1.5
E      Slovakia         SK     1     -      0.7
E      Ukraine          UA     1     -      1.5
E      Yugoslavia       YU     10    3.4    1.5
N      Denmark          DK     1     -      0.9
N      Finland          FI     1     -      1.2
N      Latvia           LV     1     -      1.6
N      Norway           NO     2     2.0    0.8
N      Sweden           SE     10    3.4    1.0
W      Belgium          BE     37    1.1    0.6
W      England          EN     22    1.3    0.7
W      France           FR     86    0.7    0.5
W      Germany          DE     71    1.1    0.9
W      Ireland          IE     60    2.6    0.6
W      Netherlands      NL     17    1.9    0.7
W      Scotland         SC     5     2.2    0.7
W      Swiss French     CHf    839   1.3    0.6
W      Swiss German     CHd    103   1.6    0.6
W      Switzerland      CH     17    1.1    0.5
W      United Kingdom   UK     358   1.2    0.7
I      Italy            IT     213   0.6    0.5
I      Portugal         PT     115   1.9    0.5
I      Spain            ES     130   1.5    0.4
TC     Cyprus           CY     3     2.7    0.4
TC     Turkey           TR     4     2.2    0.5

Table 1: Populations, abbreviations, sample sizes (n), mean number of IBD blocks shared by a pair of 
individuals from that population ("self"), and mean IBD rate averaged across all other populations ("other"); 
sorted by regional groupings described in the text. Populations with only a single sample do not have a 
"self" IBD rate. 



[Figure 2. Panels: "CHf blocks in IT", "UK blocks in IT", "IE blocks in UK", "DE blocks in UK" (axis: # blocks); scatterplot axes: "# blocks with CHf", "# blocks with Ireland".]

Figure 2: Substructure in (A) Italian and (B) UK samples. The leftmost plots of (A) show histograms of
the numbers of IBD blocks that each Italian sample shares with any French-speaking Swiss (top) and anyone
from the UK (bottom), overlaid with the expected distribution (Poisson) if there were no substructure. Next
is shown a scatterplot of numbers of blocks shared with French-speaking Swiss and UK samples, for all
samples from France, Italy, Greece, Turkey, and Cyprus. We see that the numbers of recent ancestors each
Italian shares with the French-speaking Swiss and with the United Kingdom are both bimodal, and that
these two are positively correlated, ranging continuously between values typical for Turkey/Cyprus and for
France. Figure (B) is similar, showing that the substructure within the UK is part of a continuous trend
ranging from Germany to Ireland. The outliers visible in the scatterplot of figure 2B are easily explained as
individuals with immigrant recent ancestors - the UK individuals in the lower left have many more Italian
blocks than all other UK samples, and the individual labeled "SK" is a clear outlier for the number of blocks
shared with the Slovakian sample.

Two of the more striking examples of substructure are illustrated in figure 2. Here, we see that variation
within countries can be reflective of continuous variation in ancestry that spans a broader geographic region,
crossing geographic, political and linguistic boundaries. Figure 2A shows the distinctly bimodal distribution
of numbers of IBD blocks that each Italian shares with both French-speaking Swiss and the UK, and shows
that these numbers are strongly correlated. Furthermore, the amount that Italians share with these two
populations varies continuously from values typical for Turkey and Cyprus, to values typical for France and
Switzerland. Interestingly, the Greek samples (EL) place near the middle of the Italian gradient. It is natural
to guess that there is a north-south gradient of recency of common ancestry along the length of Italy, and
that southern Italy has been historically more closely connected to the eastern Mediterranean.

In contrast, within samples from the UK and nearby regions we see negative correlation between numbers 
of blocks shared with Irish and numbers of blocks shared with Germans. From our data, we do not know 
if this substructure is also geographically arranged within the UK. However, an obvious explanation of 
this pattern is that individuals within the UK differ in the extent of their recent Irish ancestry, and that 
individuals with less Irish ancestry have a larger portion of their recent ancestry shared with Germans. This 
suggests that there is variation across the UK - perhaps a geographic gradient - in terms of the amount of 
Celtic versus Germanic ancestry. 



2.2 Europe-wide patterns of relatedness 

Individuals usually share the highest number of IBD blocks with others from the same population, but with 
some exceptions. For example, individuals in the UK share more IBD blocks on average, and hence more 
close genetic ancestors, with individuals from Ireland than with other individuals from the UK, and Germans 
share similarly more with Polish than with other Germans. In figure 3 we depict the geography of rates of
IBD sharing between populations, i.e. the average number of IBD blocks shared by a randomly chosen pair
of individuals. The maps (above) show the IBD rate relative to certain chosen populations, and the plots
(below) show all pairwise sharing rates against the geographic distance separating the populations. It
is evident that geographic proximity is a major determinant of IBD sharing (and hence recent relatedness), 
with the rate of pairwise IBD decreasing relatively smoothly as the geographic separation of the pair of 
populations increases. 

Superimposed on this geographic decay there is striking regional variation in rates of IBD. To further
explore this variation, we divided the populations into the five groups listed in table 1, using geographic
location and correlations in the pattern of IBD sharing with other populations (shown in supplemental
figure S4). These groupings are defined as: Europe "E", lying to the east of Germany and Austria; Europe
"N", lying to the north of Germany and Poland; Europe "W", to the west of Germany and Austria and
including these; the Iberian and Italian peninsulas "I"; and Turkey/Cyprus "TC". Although the general
pattern of regional IBD variation is strong, none of these groups have sharp boundaries - for instance,
Germany, Austria, and Slovakia are intermediate between E and W. Furthermore, we suspect that the
Italian and Iberian peninsulas do not group together because of shared ancestry, but rather because
of similarly low rates of IBD with other European populations. The overall mean IBD rates between these
regions are shown in table 2, and comparisons between different groupings are colored differently in figures
3G-I, showing that rates of IBD sharing between E populations and between N populations average a factor
of about three higher than other comparisons at similar distances.

To better understand IBD within these groupings, we show in figures 3G-I how average numbers of
IBD blocks shared, in three different length categories, depend on the geographic distance separating the
two populations. Even without taking into account regional variation, mean numbers of shared IBD blocks
decay roughly exponentially with distance, and further structure is revealed by breaking out populations
by the regional groupings described above. The exponential decays shown for each pair of groupings emphasize
how the decay of IBD with distance becomes more rapid for longer blocks. This is expected under models
where migration is mostly local, since as one looks further back in time, the distribution of each individual's
ancestors is less concentrated around the individual's location (recall figure 1B). Therefore, the expected
number of ancestors shared by a pair of individuals decreases as the geographic distance between the pair
increases; and the rate of this decrease is larger for more recent ancestry.

This can also explain why the decay of IBD with distance varies significantly by region. For instance,
the gradual decay of sharing with the Iberian and Italian peninsulas could occur because these blocks are
inherited from much longer ago than blocks of similar lengths shared by individuals in other populations.

Conversely, the decay with distance is also quite gradual for "E-E" relationships. This is especially true
for our shortest (oldest) blocks, where we see almost no decay with distance: individuals in our E grouping
share on average almost as many short blocks with individuals in distant E populations as they do with
others in their own population. We argue below that this is because modern individuals in these locations
have a larger proportion of their ancestors in a relatively small population that subsequently expanded.
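
The exponential decay of IBD sharing with distance discussed above can be fitted with a Poisson
regression. The sketch below is a minimal illustration under stated assumptions, not the fitting code
used for figure 3: it assumes that the number of blocks observed for a pair of populations is Poisson
with mean (number of pairs of individuals) x exp(a + b x distance), and maximizes the likelihood by
Newton's method. The function and argument names are hypothetical, and the data in the example are
synthetic.

```python
import math

def fit_exponential_decay(distances, block_counts, pair_counts,
                          max_iter=50, tol=1e-10):
    """Fit block_count ~ Poisson(pair_count * exp(a + b * distance)).

    Returns (a, b): exp(a) is the fitted per-pair IBD rate at distance 0 and
    b is the decay rate per unit distance (negative when sharing decays).
    Requires a positive total block count and at least two distinct distances.
    """
    a = math.log(sum(block_counts) / sum(pair_counts))
    b = 0.0
    for _ in range(max_iter):
        mu = [m * math.exp(a + b * d) for d, m in zip(distances, pair_counts)]
        # Gradient of the Poisson log-likelihood with respect to (a, b).
        g_a = sum(y - u for y, u in zip(block_counts, mu))
        g_b = sum(d * (y - u) for d, y, u in zip(distances, block_counts, mu))
        # Fisher information (negative Hessian), a symmetric 2x2 matrix.
        h_aa = sum(mu)
        h_ab = sum(d * u for d, u in zip(distances, mu))
        h_bb = sum(d * d * u for d, u in zip(distances, mu))
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve the 2x2 linear system explicitly.
        step_a = (h_bb * g_a - h_ab * g_b) / det
        step_b = (h_aa * g_b - h_ab * g_a) / det
        a += step_a
        b += step_b
        if abs(step_a) + abs(step_b) < tol:
            break
    return a, b

# Synthetic data: rate 2 blocks per pair at distance 0, decaying as exp(-d),
# with distance d in arbitrary units and 100 pairs per comparison.
d = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0]
pairs = [100.0] * 6
blocks = [m * 2.0 * math.exp(-x) for x, m in zip(d, pairs)]
print(fit_exponential_decay(d, blocks, pairs))
```

Because the Poisson likelihood handles zero counts directly, comparisons with no shared IBD can stay in
the fit, which a least-squares fit to log rates would not allow.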



IBD rate   E      I      N      TC     W
E          2.57   0.44   0.99   0.62   0.53
I          0.44   0.80   0.43   0.41   0.45
N          0.99   0.43   ...    0.33   0.86
TC         0.62   0.41   0.33   1.43   0.25
W          0.53   0.45   0.86   0.25   0.93

Table 2: Rates of IBD within and between each geographic grouping given in table 1.



2.3 Timing and numbers of common ancestors 

Each block of genome shared IBD represents genetic material inherited from a single genetic common ancestor 
shared by a pair of individuals - at least, the vast majority do, as these blocks come from long enough ago 
that it is highly unlikely for more than one block to be inherited along precisely the same path through the 
pedigree. Since the distribution of lengths of IBD blocks differs depending on the age of the ancestors - 
e.g. older blocks tend to be shorter - it is possible to use the distribution of lengths of IBD blocks to infer 
numbers of most recent genetic common ancestors back through time. This method is conceptually similar
to the work of Pool and Nielsen (2009) and Gravel (2012), who used the length distribution of admixture
tracts to infer parameters in demographic models.
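
The idea that block lengths carry information about ancestor ages can be illustrated with a simple
forward calculation. The sketch below is a deliberately simplified model and not the inference method
itself: it assumes that each most recent genetic common ancestor n generations back contributes one block
whose length is exponential with mean 100/(2n) cM, and it ignores detection power, false positives, and
finite chromosome lengths. The function name and the two example histories are hypothetical.

```python
import math

def expected_blocks_longer_than(length_cm, ancestors_per_generation):
    """Expected number of shared blocks longer than length_cm.

    ancestors_per_generation maps a generation n (>= 1) to the mean number of
    most recent genetic common ancestors a pair shares at that generation.
    """
    total = 0.0
    for n, count in ancestors_per_generation.items():
        # P(block > length_cm) for an exponential length with rate 2n per Morgan.
        total += count * math.exp(-2 * n * length_cm / 100.0)
    return total

# Two made-up histories: few recent ancestors versus many older ancestors.
recent_history = {20: 1.0}
older_history = {80: 10.0}
for threshold_cm in (2.0, 8.0):
    print(threshold_cm,
          expected_blocks_longer_than(threshold_cm, recent_history),
          expected_blocks_longer_than(threshold_cm, older_history))
```

In this toy comparison the two histories can produce similar numbers of blocks above a short threshold
while differing sharply above a long one, so it is the shape of the length distribution, rather than
the total count, that distinguishes them.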



Nature of the results on age inference. There are two major difficulties to overcome, however. First,
detection is noisy: we do not detect all IBD segments (especially shorter ones), and some of our IBD segments
are false positives. This problem can be overcome by careful estimation and modeling of error, described in
section 4.3. The second problem is more serious and unavoidable: as described in section 4.7, this inference
problem is extremely "ill conditioned" (in the sense of Petrov and Sizikov 2005), meaning in this case that
there are many possible histories of shared ancestry that fit the data nearly equally well. For this reason,
our results necessarily have a high degree of uncertainty, but still provide a good deal of useful information.

We deal with this uncertainty by describing the set of histories (i.e. historical numbers of common genetic
ancestors) that are consistent with the data, summarized in two ways. First, it is useful to look at individual
consistent histories, which gives a sense of recurrent patterns and possible historical signals. Figure 4 shows
for several populations both the best-fitting history (in black) and the smoothest history that still fits the
data (in red). We can make general statements if they hold across all (or most) consistent histories. Second,
we can summarize the entire set of consistent histories by finding confidence intervals (bounds) for the
total number of common ancestors aggregated in certain time periods. These are shown in Figure 5, giving
estimates (colored bands) and bounds (vertical lines) for the total numbers of genetic common ancestors
in each of three time periods, roughly 0-500ya, 500-1500ya, and 1500-2500ya ("ya" denotes "years ago").
Supplemental figure S5 is a version of figure 5 with more populations, and plots analogous to figure 4 for
all these histories are shown in supplemental figure S7. For a precise description of the problem and our
methods, see section 4.7.

The time periods we use for these bounds are quite large, but this is unavoidable, because of a trade-off
between temporal resolution and uncertainty in numbers of common ancestors. Also note that the lower
bounds on numbers of common ancestors during each time interval are often effectively zero, because one
can (roughly speaking) obtain a history with equally good fit by moving ancestors from that time interval
into the neighboring ones, resulting in peaks on either side of the selected time interval. Thus, these lower
bounds are perhaps unrealistic, and the reader should bear in mind that the dependence between intervals
in our uncertainty is not depicted.

Figure 3: (A-F) The area of the circle located on a particular population is proportional to the
mean number of IBD blocks of length at least 1cM shared between random individuals chosen from that
population and the population named in the label (also marked with a star). Both regional variation of
overall IBD rates and gradual geographic decay are apparent. (G-I) Mean number of IBD blocks of lengths
1-3cM, 3-5cM, and >5cM, respectively, shared by a pair of individuals across all pairs of populations; the
area of the point is proportional to sample size (number of distinct pairs), capped at a reasonable value.
Colors give categories based on the regional groupings of table 1, and lines show an exponential decay fit to
each category (using a Poisson GLM weighted by sample size). Comparisons with no shared IBD are used
in the fit but not shown in the figure (due to the log scale). "E-E", "N-N", and "W-W" denote any two
populations both in the E, N, or W grouping, respectively; "TC-any" denotes any population paired with
Turkey or Cyprus; "I-(E,N,W)" denotes Italy paired with any population except Turkey or Cyprus; and
"between E,N,W" denotes the remaining pairs (when both populations are in E, N, or W, but the two are
in different groups). The exponential fit for the N-N points is not shown due to the very small sample size.

Figure 4: Estimated average number of genetic common ancestors per generation back through time shared
by (A) pairs of individuals from "the Balkans" (former Yugoslavia, Bulgaria, Romania, Croatia, Bosnia,
Montenegro, Macedonia, Serbia, and Slovenia, excluding Albanian speakers); and, shared by one individual
from the Balkans with one individual from (B) Albanian speaking populations; (C) Italy; or (D) France.
The black distribution is the maximum likelihood fit; shown in red is the smoothest solution that still fits
the data, as described in section 4.7. (E) shows the observed IBD length distribution for pairs of individuals
from the Balkans (red curve), along with the distribution predicted by the smooth (red) distribution in the
first figure, partitioned by time period in which the common ancestor lived. The second column of figures is
similar, except that comparisons are relative to samples from the UK.

Figure 5: Estimated total numbers of genetic common ancestors shared by various pairs of populations,
in roughly the time periods 0-500ya, 500-1500ya, 1500-2500ya, and 2500-4300ya. We have combined some
populations to obtain larger sample sizes: "S-C" denotes Serbo-Croatian speakers in former Yugoslavia, 
"PL" denotes Poland, "R-B" denotes Romania and Bulgaria, "DE" denotes Germany, "Bal" denotes Latvia, 
Finland, Sweden, Norway, and Denmark, "UK" denotes the United Kingdom, and "IT" denotes Italy. For 
instance, the green bars in the leftmost panels tell us that Serbo-Croatian speakers and Germans share 0-1 
most recent genetic common ancestor from the last 500 years, 10-30 from the period 500-1500 years ago, 
and around 500 from each of the two previous thousand years. Although the lower bounds appear to extend 
to zero, they are significantly above zero in nearly all cases except for the most recent period 0-540ya. 



Results of age inference In figure 4 we show how the age and amount of shared genetic ancestry change
as we move away from the Balkans (left column) and the UK (right column), along with two examples of
how the observed block length distribution is composed of ancestry from different depths. More plots of this
form are shown in supplemental figure S7.

Most detectable recent common ancestors lived between 1500 and 2500 years ago. Furthermore, only a 
small proportion of blocks longer than 2cM are inherited from longer ago than 4000 years. Obviously, there
are a vast number of genetic common ancestors older than this, but the blocks inherited from such common
ancestors are sufficiently unlikely to be longer than 2cM that we do not detect many. For the most part,
blocks longer than 4cM come from 500-1500 years ago, and blocks longer than 10cM from the last 500 years.

In most cases, only pairs within the same population are likely to share genetic common ancestors within 
the last 500 years. Exceptions are generally neighboring populations that have had high population growth 
and/or asymmetric migration rates recently (e.g. UK and Ireland). During the period 500-1500ya, individuals 
typically share tens to hundreds of genetic common ancestors with others in the same or nearby populations, 
although some distant populations have very low rates. Longer ago than 1500ya, pairs of individuals from 
any part of Europe share hundreds of genetic ancestors in common, and some share significantly more. 



Regional variation: interesting cases We now examine some of the more striking patterns we see in 
more detail. 

There is relatively little common ancestry shared between the Italian peninsula and other locations, 
and what there is seems to derive from longer ago than 2500ya. This relatively old date is consistent 
with the weak relationship of geographic distance and rate of IBD sharing of figure 3, as discussed above.
An exception is that Italy and the neighboring Balkan populations share small but significant numbers of
common ancestors in the last 1500 years, as seen in supplemental figures S7 or S8. The rate of genetic common
ancestry between pairs of Italian individuals seems to have been fairly constant for the past 2500 years, which
combined with significant structure within Italy suggests a constant exchange of migrants between coherent
subpopulations. Also recall that most populations show no substructure with regard to the number of blocks
shared with Italians, implying that the common ancestors shared with Italy predate divisions within these 
populations. 

Patterns for the Iberian peninsula are similar, with both Spain and Portugal showing very few common 
ancestors with other populations over the last 2500 years. However, the rate of IBD sharing within the 
peninsula is much higher than within Italy - the Iberian peninsula shares fewer than 5 genetic common 
ancestors with other populations during the last 1500 years, compared to 78 per pair within the peninsula, 
and past 1500ya Iberian individuals share only slightly more genetic common ancestors with each other than
they do with people from most of the rest of Europe.

The higher rates of IBD between populations in the "E" grouping shown in figure 3 seem to derive mostly
from ancestors living 1500-2500ya, but also show increased numbers from 500-1500ya, as shown in figure 5
and supplemental figure S8. Across all populations, even geographically distant individuals share about as
many common ancestors as do two Irish or two French-speaking Swiss.

By far the highest rates of IBD within any population are found between Albanian speakers - around
200 ancestors from 0-500ya, and around 1800 ancestors from 500-1500ya (so high that we left them out
of figure 5; see supplemental figure S5). Beyond 1500ya, the rates of IBD drop to levels typical for other
populations in the eastern grouping. 

There are clear differences in the number and timing of genetic common ancestors shared by individuals 
from different parts of Europe. These differences reflect the impact of major historical and demographic
events, superimposed against a background of local migration and generally high genealogical relatedness 
across Europe. We now turn to discuss possible causes and implications of these results. 



3 Discussion 

Genetic common ancestry within the last 2,500 years across Europe has been shaped by diverse demographic 
and historical events. There are continental trends, such as a decrease of shared ancestry with distance;
regional patterns, such as higher IBD in eastern and northern populations; and diverse outlying signals. We
have furthermore quantified numbers of genetic common ancestors that populations share with each other
back through time, albeit with a large degree of (unavoidable) uncertainty. These numbers are intriguing
not only because of the differences between populations, which reflect historical events, but also because
of the high degree of implied commonality between even geographically distant populations.
Ubiquity of common ancestry We have shown that typical pairs of individuals drawn from across
Europe have a good chance of sharing long stretches of identity by descent, even when they are separated by 
thousands of kilometers. This implies that pairs of individuals across Europe likely share common genetic 
ancestors within the last 1,000 years, and are certain to share many within the last 2,500 years. The average 
number of genetic common ancestors from the last 1,000 years shared by individuals living at least 2,000km 
apart is about .125 (and at least .05); between 1,000-2,000ya they share about four; and between 2,000-
3,000ya they share above 50. Since the chance is small that any genetic material has been transmitted along
a particular genealogical path from ancestor to descendent more than 8 generations deep (Donnelly 1983)
- about .008 at 240ya, and $2.5 \times 10^{-7}$ at 480ya - this implies the sharing of, conservatively, thousands of
genealogical ancestors in only the last 1,000 years between pairs of individuals even when they are separated
by large geographic distances. At first sight this result seems counterintuitive. However, as 1000 years
is about 33 generations, and $2^{33} \approx 10^{10}$ is far larger than the size of the European population, so long
as populations have mixed sufficiently, by 1,000 years ago everyone (who left descendants) would be an
ancestor of every present day European. Our results are therefore one of the first genomic demonstrations
of the counter-intuitive but necessary fact that all Europeans are genealogically related over very short time
periods, and lend substantial support to models predicting close and ubiquitous common ancestry of all
modern humans (Rohde et al. 2004).
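
The two transmission probabilities quoted above can be checked with a rough back-of-envelope calculation. The sketch below is not taken from Donnelly (1983); it assumes 22 autosomes and a total autosomal map length of about 33 Morgans (both illustrative assumptions), and approximates the chance of sharing any material through a common ancestor n generations back, along a path of 2n meioses, by the expected number of shared blocks.

```python
# Rough check of the quoted transmission probabilities.
# Illustrative assumptions: 22 autosomes, about 33 Morgans of autosomal map length.
N_CHROMOSOMES = 22
GENOME_MORGANS = 33.0
GENERATION_YEARS = 30

def expected_shared_blocks(generations_back):
    """Approximate expected number of blocks shared through one common ancestor
    `generations_back` generations ago, via a path of 2 * generations_back meioses."""
    meioses = 2 * generations_back
    # Each meiosis adds about GENOME_MORGANS breakpoints; each resulting piece
    # survives all `meioses` transmissions with probability 2 ** -meioses.
    return (N_CHROMOSOMES + GENOME_MORGANS * meioses) / 2.0 ** meioses

for years in (240, 480):
    n = years // GENERATION_YEARS
    print(years, "ya:", expected_shared_blocks(n))

# Number of ancestral "slots" 33 generations back, ignoring pedigree collapse.
print(2 ** 33)
```

Under these assumptions the function gives roughly .008 at 8 generations and roughly 2.5 x 10^-7 at 16 generations, in line with the figures in the text.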



The fact that most people alive today in Europe share nearly the same set of (European, and possibly
world-wide) ancestors from only 1,000 years ago seems to contradict the signals of long term, albeit subtle,
population genetic structure within Europe (e.g. Novembre et al. 2008; Lao et al. 2008). These two facts
can be reconciled by the fact that even though the distribution of ancestors (as cartooned in Figure 1B) has
spread to cover the continent, there remain differences in degree of relatedness of modern individuals to these 
ancestral individuals. For example, someone in Spain may be related to an ancestor in the Iberian peninsula 
through perhaps 1000 different routes back through the pedigree, but to an ancestor in the Baltic region 
by only 10 different routes, so that the probability that this Spanish individual inherited genetic material 
from the Iberian ancestor is 100 times higher. This allows the amount of genetic material shared by pairs of 
extant individuals to vary even if the set of ancestors is constant. 



Limitations of Sampling A concern about our results is that the European individuals in the POPRES 
dataset were all sampled in either Lausanne or London. This might bias our results, for instance, because 
immigrant communities may originate mostly from a particular small portion of their home population, 
thereby sharing a particularly high number of common ancestors with each other. We see remarkably little 
evidence that this is the case: there is a high degree of consistency in numbers of IBD blocks shared across 
samples from each population, and between neighboring populations. For instance, the high degree of shared 
common ancestry among Albanian speakers might be because most of these originated from a small village 
rather than uniformly across Albania and Kosovo. However, this would not explain the high rate of IBD 
between Albanian speakers and neighboring populations. Even populations from which we only have one 
or two samples, which we at first assumed would be unusably noisy, provide generally reliable, consistent 
patterns, as evidenced by e.g. supplemental figure S3.

Conversely, it might be a concern that individuals sampled in Lausanne or London are more likely to have 
recent ancestors more widely dispersed than is typical for their population of origin. This is a possibility 
we cannot discard, and if true, would mean there is more structure within Europe than what we detect. 
However, again because of the incredibly rapid spread of ancestry, this is unlikely to have an effect over
more than a few generations and so does not pose a serious concern about our results. Fine-scale geographic
sampling of Europe as a whole is needed to address these issues, and these efforts are underway in certain
populations (e.g. Price et al. 2009; Jakkula et al. 2008; Tyler-Smith and Xue 2012; Winney et al. 2011).



Finally, we have necessarily taken a narrow view of European ancestry as we have restricted our
sample to individuals who are not outliers with respect to genetic ancestry, and when possible to those
having all four grandparents drawn from the same country. Clearly the ancestry of Europeans is far more
diverse than that represented here, but such steps seemed necessary to make best use of this dataset.
Ages of particular common ancestors We have shown that the problem of inferring the average
distribution of genetic common ancestors back through time has a large degree of fundamental uncertainty. The
data effectively leave a large number of degrees of freedom unspecified, so one must either describe the set 
of possible histories, as we do, and/or use prior information to restrict these degrees of freedom. 

A related but far more intractable problem is to make a good guess of how long ago a certain shared 
genetic common ancestor lived, as personal genome services would like to do, for instance: if you and I share 
a 10cM block of genome IBD, when did our most recent common ancestor likely live? Since the mean length
of an IBD block inherited from 5 generations ago is 10cM, we might expect the average age of the ancestor
of a 10cM block to be from around 5 generations. However, using our results, the typical age of a 10cM
block shared by two individuals from the UK is between 32 and 52 generations. This discrepancy results
from the fact that you are a priori much more likely to share a common genetic ancestor further in the past, 
and this acts to skew our answers away from the naive expectation. This also means that estimated ages 
must depend drastically on the populations' shared histories: the age of such a block shared by someone 
from the UK with someone from Italy is older, usually from 56 to 60 generations ago. This may not apply 
to ancestors from the past very few (perhaps fewer than eight) generations, from whom we expect to inherit
multiple long blocks - in this case we can hope to infer a specific genealogical relationship with reasonable
certainty (e.g. Huff et al. 2011; Henn et al. 2012b), although even then care must be taken to exclude the
possibility that these multiple blocks have been inherited from distinct common ancestors.
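
To see why the naive expectation fails, consider a toy calculation (this is not the inference used for the estimates above). Assume that the number of blocks of length x Morgans inherited from common ancestors n generations ago is proportional to mu(n) (2n)^2 exp(-2nx), where mu(n) is the number of shared ancestors at depth n. Both this kernel and the flat choice mu(n) = 1 are simplifying assumptions made here for illustration; the function names are hypothetical.

```python
import math

def mean_block_length_cm(generations):
    """Mean length (cM) of a block inherited through 2 * generations meioses."""
    return 100.0 / (2 * generations)

def mean_age_given_length(length_cm, ancestors_at=lambda n: 1.0, max_generations=2000):
    """Average age (generations) of the ancestor of a block of the given length,
    weighting each depth n by ancestors_at(n) * (2n)^2 * exp(-2n * length)."""
    x = length_cm / 100.0  # convert cM to Morgans
    depths = range(1, max_generations + 1)
    weights = [ancestors_at(n) * (2 * n) ** 2 * math.exp(-2 * n * x) for n in depths]
    total = sum(weights)
    return sum(n * w for n, w in zip(depths, weights)) / total

print(mean_block_length_cm(5))       # 10.0
print(mean_age_given_length(10.0))   # close to 15 under the flat assumption
```

Even with equally many shared ancestors at every depth, the average age of a 10cM block in this toy model is about three times the naive 5 generations. A history in which shared ancestors become more numerous further back (for example `ancestors_at=lambda n: n`) pushes the average older still, which is the direction of the estimates quoted above.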

Although the sharing of a long genomic segment can be an intriguing sign of some recent shared ancestry, 
the ubiquity of shared genealogical ancestry only tens of generations ago across Europe (and likely the world;
Rohde et al. 2004) makes such sharing unsurprising, and assignment to particular genealogical relationships
impossible. What is informative about these chance sharing events from distant ancestors is that they
provide a fine-scale view of an individual's distribution of ancestors (e.g. figure [s]), and that in aggregate 
they can provide an unprecedented view into even small-scale human demographic history. 



3.1 The signal of history 

As we have shown, patterns of IBD provide ample but noisy geographic and temporal signals, which can 
then be connected to historical events. Rigorously making such connections is difficult, due to the complex 
recent history of Europe, controversy about the demographic significance of many events, and uncertainties 
in inferring the ages of common ancestors. Nonetheless, our results can be plausibly connected to several 
historical and demographic events. 



The migration period — Huns, Goths, and Slavs One of the striking patterns we see is the relatively 
high level of sharing of IBD between pairs of individuals across eastern Europe, as high or higher than that 
observed within other, much smaller populations. Furthermore, the numbers of short (older) IBD blocks 
shared between different populations is constant regardless of the geographic distance separating the two, 
as shown in figure 3. This is consistent with these individuals having a comparatively large proportion of
ancestry drawn from a relatively small population that expanded over a large geographic area, ancestry 
which we date to 1,000-2,000 years ago (see figures 4, 5, and S8). For example, even individuals from
widely separated eastern populations share about the same amount of IBD as do two Irish individuals (see 
supplemental figure S3), suggesting that this ancestral population may have been relatively small. 



This evidence is consistent with the idea that these populations derive a substantial proportion of their
ancestry from various groups that expanded during the "migration period" from the fourth through ninth
centuries (Davies 2010). This period begins with the Huns moving into eastern Europe towards the end of
the fourth century, establishing an empire including modern-day Hungary and Romania; and continues in
the fifth century as various Germanic groups moved into and ruled much of the western Roman empire. The
Slavic populations expanded beginning in the sixth century, probably from somewhere in the area between
the Baltic, Black, and Adriatic seas (Barford 2001). By the seventh century, Slavic groups had spread into
the Balkan Peninsula, and occupied much of the northern Balkans by the 10th century. The inclusion of
(non-Slavic speaking) Hungary and Romania in the group of eastern populations sharing high IBD could be
due to the Huns, or because some of the same group of people who elsewhere are known as Slavs adopted
different local cultures in those regions. Greece and Albania are also part of this putative signal of expansion,
which could be because the Slavs settled in part of these areas (with unknown demographic effect). Additional
work and methods would be needed to ascertain the geographic source(s) of this expansion.

Italy, Iberia, and France On the other hand, we find that France and the Italian and Iberian peninsulas
have the lowest rates of genetic common ancestry in the last 1,500 years, other than Turkey and Cyprus, 
and are the regions of continental Europe thought to be least affected by the Slavic and Hunnic migrations. 
These regions were, however, moved into by Germanic tribes (e.g. the Goths, Ostrogoths, and Vandals), 
which suggests that perhaps the Germanic migrations/invasions of these regions entailed a smaller degree of 
population replacement, or at least a larger rate of population growth, than the Slavic and/or Hunnic. It 
has been argued that this is the case, perhaps because Slavs moved into relatively depopulated areas, while
Gothic "migrations" may have been takeovers by small military groups of extant populations (Halsall 2005;
Kobylinski 2005).



In addition to the very few genetic common ancestors that Italians share both with each other and with 
other Europeans, we have seen significant modern substructure within Italy (see figure 2) that predates most
of this common ancestry, and estimate that most of the common ancestry shared between Italy and other
populations is older than about 2,300 years (see supplemental figure S7). This suggests significant
substructure and large population sizes within Italy, strong enough that different groups within Italy share as little
recent common ancestry as other distinct, modern-day countries, substructure that was not homogenized
during the migration period. These patterns could also reflect in part a history of settlement of Italy from
various sources, including settlement of Greeks in southern Italy, settlement of Illyrians in eastern Italy,
and an influx of people from across the Roman empire, including gene flow from Africa (Auton et al. 2009;
Moorjani et al. 2011); but they are unlikely to be entirely due to these effects.

In contrast to Italy, the rate of sharing of IBD within the Iberian peninsula is similar to that within other 
populations in Europe. There is furthermore much less evidence of substructure within our Iberian samples 
than within the Italians, as shown in supplemental figure S2. This suggests that the reduced rate of shared
ancestry is due to geographic isolation (by distance and/or the Pyrenees) rather than stable substructure
within the peninsula. While the African gene flow into Iberia has likely reduced the amount of sharing with
the rest of Europe, as it contributed only a few percent of the ancestry of modern genomes (Moorjani et al.
2011), it cannot explain the size of the reduction that we see. Furthermore, most of the shared ancestry
between the Iberian peninsula and other populations seems to have occurred more than 2,000 years ago. It 
seems therefore that the Germanic (and later Moorish) invasions did not lead to increased common ancestry 
in the same way as the expansions in eastern Europe, and that subsequent mixing was slowed by geography. 

Other historical signals There are many other possible signals in these data; here we focus on only a
few.

The highest levels of IBD sharing are found in the Albanian-speaking individuals (from Albania and 
Kosovo), an increase in common ancestry deriving from the last 1,500 years. This suggests that a reasonable 
proportion of the ancestors of modern-day Albanian speakers are drawn from a relatively small, cohesive 
population that has persisted for at least the last 1,500 years. These individuals share similar numbers of 
common ancestors with nearby populations as do individuals in other parts of Europe, implying that the 
Albanian speakers have not been a particularly isolated population so much as a small one. Furthermore, our 
Greek samples (and to a lesser degree, the Macedonians) share much higher numbers of common ancestors 
with Albanian speakers than with other neighbors, possibly due to smaller effects of the Slavic expansion 
in these populations. The Albanian language is an Indo-European language without other close relatives
(Hamp 1966) that persisted through periods when neighboring languages were strongly influenced by Latin
or Greek. The "origin" of modern-day Albanians is contentious; it is argued for instance that they are
descended in large part from the Illyrians (Wilkes 1996), who populated the eastern side of the Adriatic sea
and part of modern-day Salento (Italy) during Roman times. Our results are certainly consistent with this
view, including the fact that Italians share more common ancestors with Albanian speakers than with other
populations, although these ancestors are estimated to be from the last 1,500 years, so this may instead
reflect more recent migration.

There are other regions having high genetic common ancestry that suggest intriguing conclusions, but
these must remain tentative due to low sample sizes. For example, our 13 samples from Scandinavian populations
(Norway, Sweden, Denmark) share high rates of IBD with each other and with certain nearby populations.
We estimate that these populations had higher rates of genetic common ancestry from the last 1500 years
with the UK, Ireland, Belgium, the Netherlands, France, Germany, German-speaking Swiss, Finland, Russia,
Latvia, Romania, and Ukraine (see supplemental figure S5), which coincides quite well with known Viking
settlements, suggesting that the Norse expansion may have had a significant demographic effect. However, 
we would need larger samples to be confident about these results. 



Future directions Our results show that patterns of recent identity by descent both provide evidence of 
ubiquitous shared common ancestry and hold the potential to shed considerable light on the complex history 
of Europe. However, these inferences also quickly run up against a fundamental limit to our ability to infer 
pairwise rates of recent common genetic ancestry. In order to make a fuller model of European history, we 
will need to make use of diverse sources of genomic information from large samples, including IBD segments 
and rare variants (Nelson et al. 2012; Tennessen et al. 2012), and develop methods that can more fully
utilize this information. Another profound difficulty is that Europe - and indeed any large continental
region - has such complex layers of history, through which ancestry has mixed so greatly, that attempts
to connect genetic signals in extant individuals to particular historical events require the corroboration of
other sources of information. For example, the ability to isolate ancient autosomal DNA from individuals
who lived during these time periods (as do Skoglund et al. 2012; Keller et al. 2012) will help to overcome
some of these profound difficulties. More generally, the quickly falling cost of sequencing, along with
the development of new methods, will shed light on the recent demographic and genealogical history of 
populations of recombining organisms, human and otherwise. 



4 Materials and Methods 



4.1 Description of data and data cleaning 

The portion of the POPRES dataset that we use was collected partly in Lausanne, Switzerland and partly
in London, England; it is described in Nelson et al. (2008). Those collected in Lausanne reported parental
and grandparental country of origin; those collected in London did not. We followed Novembre et al. (2008)
in assigning each sample to the common grandparental country of origin when available, and discarding
samples whose parents or grandparents originated from different countries. We took further steps to restrict
to individuals whose grandparents came from the same geographic region, first performing principal
components analysis on the data using SMARTPCA (Patterson et al. 2006), and excluding those individuals
who clustered with populations outside Europe (the majority of such were already excluded by self-reported
non-European grandparents). We then used PLINK's inference of the fraction of single-marker IBD (Z0,
Z1, and Z2; Purcell et al. 2007) to identify very close relatives, finding 25 pairs that are first cousins or
closer (including duplicated samples), and excluded one individual from each pair. We grouped samples into
populations mostly by reported country, but also used reported language in a few cases. Because of the
large Swiss samples, we split this group into three by language: French-speaking (CHf), German-speaking
(CHd), or other (CH). Many samples reported grandparents from Yugoslavia; when possible we assigned
these to a modern-day country by language, and when this was ambiguous or missing we assigned these to
"Yugoslavia". Most samples from the United Kingdom reported this as their country of origin; however, the
few that reported "England" or "Scotland" were assigned this label. This left us with 2,257 individuals from
40 populations; for sample sizes see table 1. Supplemental table S1 further breaks this down, and
unambiguously gives the composition of each population. Physical distances were converted to genetic distances
using the hg36 map, and the average human generation time was taken to be 30 years (Fenner 2005).

Code implementing all methods described below is available at http://www.github.com/petrelharp
4.2 Calling IBD blocks



To find blocks of IBD, we used the method fastIBD implemented in BEAGLE (Browning and Browning
2011). As suggested by the authors, in all cases we ran the algorithm 10 times with different random
seeds, and postprocessed the results to obtain IBD blocks. Based on our power simulations described below,
we modified the postprocessing procedure recommended by Browning and Browning (2011) to deal with
spurious gaps introduced into long blocks of IBD. We called IBD segments by first removing any segments 
not overlapping a segment seen in at least one other run (with no score cutoff); then merging any two segments
separated by a gap shorter than at least one of the segments and no more than 5cM long; and finally 
discarding any merged segments that did not contain a subsegment with score below 10~^. As shown in 
figure 6, this resulted in a false positive rate of between 8-15% across length categories, and a power of at
least 70% above 1cM, reaching 95% by 4cM. After post-processing, we were left with 1.9 million IBD blocks,
1 million of which were at least 2cM long (at which length we estimate 85% power and a 10% false positive 
rate). 
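
The three post-processing steps can be sketched as follows for the segments found for one pair of individuals on one chromosome. This is a minimal illustration rather than the released code: the `Segment` record, the function names, and the choice to fuse overlapping segments from different runs into one block are assumptions made here, and the score cutoff is passed in by the caller.

```python
from collections import namedtuple

# Hypothetical record: positions in cM, fastIBD score, and index of the run.
Segment = namedtuple("Segment", "start end score run")

def overlaps(a, b):
    return a.start < b.end and b.start < a.end

def call_blocks(segments, score_cutoff, max_gap_cm=5.0):
    """Post-process segments for one pair of individuals on one chromosome."""
    # 1. Keep segments that overlap a segment found in a different run.
    kept = [s for s in segments
            if any(t.run != s.run and overlaps(s, t) for t in segments)]
    kept.sort(key=lambda s: s.start)

    # 2. Sweep left to right, merging across gaps that are shorter than at
    #    least one of the two neighbors and no longer than max_gap_cm.
    merged = []  # each entry: [start, end, lowest score seen]
    for s in kept:
        if merged:
            cur = merged[-1]
            gap = s.start - cur[1]
            shorter_than_one = gap < max(cur[1] - cur[0], s.end - s.start)
            if gap <= 0 or (shorter_than_one and gap <= max_gap_cm):
                cur[1] = max(cur[1], s.end)
                cur[2] = min(cur[2], s.score)
                continue
        merged.append([s.start, s.end, s.score])

    # 3. Keep merged blocks containing a subsegment scoring below the cutoff.
    return [(start, end) for start, end, best in merged if best < score_cutoff]
```

Note that the sweep compares each gap with the length of the block merged so far, which is one possible reading of "shorter than at least one of the segments".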




Figure 6: (A) Bias in inferred length with lines x = y (dotted) and a loess fit (solid). Each point 
is a segment of true IBD (copied between individuals), showing its true length and inferred length after 
postprocessing. Color shows the number of distinct, nonoverlapping segments found by BEAGLE, and the 
length of the vertical line gives the total length of gaps between such segments that BEAGLE falsely inferred 
was not IBD. (B) Estimated false positive rate as a function of length. Observed rates of IBD blocks, per 
pair and per cM, are also displayed for the purpose of comparison. "Nearby" and "Distant" mean IBD
between pairs of populations closer and farther away than 1000km, respectively. (C) Below, the estimated
power as a function of length, together with the parametric fit of equation ([5]). 
4.3 Power and false positive simulations

All methods to identify haplotypic IBD rely on identifying long regions of near identical haplotypes between 
pairs of individuals (referred to as identical by state, IBS). However, long IBS haplotypes could potentially 
also result from the concatenation of multiple shorter blocks of true IBD. While such runs can contain
important information about deeper population history (e.g. Li and Durbin 2011), we view them as false
positives as they confound the relationship between number of shared IBD blocks and genetic ancestors.
The chance of such a false positive IBD segment decreases as the genetic length of shared haplotype increases. 
However, the density of informative markers also plays a role, because in regions of low marker density we 
can miss alleles that differ. 

If we are to have a reasonable false positive rate, we must accept imperfect power. Power will also vary 
with the density and informativeness of markers and length of segment considered. For example, it is intuitive 
that segments of genome containing many rare alleles are easier to identify as IBD. Conversely, rare immigrant
segments from a population with different allele frequencies may have higher false positive rates. For these 
reasons, when estimating statistical power and false positive rate, it is important to use a dataset as similar 
to the one under consideration as possible. Therefore, to determine appropriate postprocessing criteria and 
to estimate our statistical power, we constructed a dataset similar to the POPRES with known shared 
IBD segments as follows: we copied segments randomly between 60 trio-phased individuals of European 
descent (using only one from each trio) from the HapMap dataset (haplotypes from release #21, 17/07/06;
International HapMap Consortium et al. 2007), substituted these for 60 individuals from Switzerland in
the POPRES data, and ran BEAGLE on the result as before. We copied segments of single chromosomes 
between randomly chosen individuals, for random lengths 0.5-20cM, with gaps of at least 2cM between 
adjacent segments and without copying between the same two individuals twice in a row. When copying, we 
furthermore introduced genotyping error by flipping alleles independently with probability .002 and marking 
the allele missing with probability .023 (error rates were determined from duplicated individuals in the 
sample). An important feature of the inferred data was that BEAGLE often introduced spurious gaps into
blocks longer than about 5cM, which led us to merge blocks as described above.
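
The genotyping-error step of this simulation can be sketched as below. The representation of a copied segment as a list of 0/1 alleles, with `None` for a missing call, and the function name are assumptions made for illustration; only the two error probabilities come from the text.

```python
import random

def add_genotyping_error(alleles, rng, flip_prob=0.002, missing_prob=0.023):
    """Return a noisy copy of a list of 0/1 alleles; None marks a missing call."""
    noisy = []
    for a in alleles:
        if rng.random() < flip_prob:
            a = 1 - a          # flip the allele
        if rng.random() < missing_prob:
            a = None           # mark the allele as missing
        noisy.append(a)
    return noisy

rng = random.Random(1)
copied = add_genotyping_error([0, 1] * 50000, rng)
print(sum(a is None for a in copied) / len(copied))   # about 0.023
```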

We need a reasonably accurate assessment of our power, bias, and false positive rates for our inversion 
of the relationship between IBD block rate and number of genetic ancestors. Although the estimated IBD 
lengths were approximately unbiased, to adjust equation Q we also fit a parametric model to the relationship 
between true and inferred lengths after removing inferred blocks less than 1cM long. A true IBD block of
length $x$ is missed entirely with probability $1 - c(x)$, and is otherwise inferred to have length $x + \epsilon$; with
probability $\gamma(x)$ the error $\epsilon$ is positive; otherwise it is negative and conditioned to be less than $x$ in
magnitude. In either case, $\epsilon$ is exponentially distributed; if $\epsilon > 0$ its mean is $1/\lambda_+(x)$, while if
$\epsilon < 0$ its (unconditional) mean is $1/\lambda_-(x)$. The parametric forms were chosen by examination of the data,
and fit by maximum likelihood; these are:

\begin{align}
c(x) &= 1 - 1/\left(1 + .077\, x^2 \exp(.54\, x)\right) \tag{1} \\
\gamma(x) &= .34 \left(1 - \left(1 + .51\,(x - 1)_+ \exp\left(.68\,(x - 1)_+\right)\right)^{-1}\right) \tag{2} \\
\lambda_+(x) &= 1.40 \tag{3} \\
\lambda_-(x) &= \min\left(.40 + 1/(.18\, x),\ 12\right) \tag{4}
\end{align}

where $z_+ = \max(z, 0)$.
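
As an illustration of how this error model generates an inferred length, the sketch below draws one inferred length for a true block. It hard-codes only the two rate functions of equations (3) and (4); the detection probability `c` and the overestimate probability `gamma` are passed in as callables so that any fitted form can be supplied. The function names and the inverse-CDF treatment of the truncated negative error are choices made here, not a description of the released code.

```python
import math
import random

def lam_plus(x):
    return 1.40                                # equation (3)

def lam_minus(x):
    return min(0.40 + 1.0 / (0.18 * x), 12)    # equation (4)

def sample_inferred_length(x, c, gamma, rng):
    """Draw an inferred length for a true block of length x (cM); None if missed.
    `c` and `gamma` are callables giving detection and overestimate probabilities."""
    if rng.random() >= c(x):
        return None                            # block missed entirely
    if rng.random() < gamma(x):
        return x + rng.expovariate(lam_plus(x))    # length overestimated
    # Underestimate: exponential error truncated to [0, x), via the inverse CDF.
    rate = lam_minus(x)
    u = rng.random()
    err = -math.log(1.0 - u * (1.0 - math.exp(-rate * x))) / rate
    return x - err
```

For example, `sample_inferred_length(3.0, c, gamma, random.Random(0))` returns either `None` or a positive length, where `c` and `gamma` implement the fitted curves.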

To estimate the false positive rate, we randomly shuffled segments of diploid genome between individuals 
from the same population (only those 12 populations with at least 19 samples) so that any run of IBD 
longer than about 0.5cM would be broken up among many individuals. Specifically, as we read along the 
genome we output diploid genotypes in random order; we shuffled this order by exchanging the identity of 
each output individual with another at independent increments chosen uniformly between 0.1 and 0.2cM. 
This ensured that no output individual had a continuous run of length longer than 0.2cM copied from a 
single input individual, while also preserving linkage on scales shorter than 0.1cM. The results are shown in
figure 6B; from these we estimate that the mean density of false positives $x$ cM long per pair and per cM is
approximately

$$f(x) = \exp(-13 - 2x + 4.3\sqrt{x}), \qquad (5)$$

a parametric form again chosen by examination of the data.
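
To make the scale of equation (5) concrete, here is a small Python sketch that evaluates the false positive density and integrates it over a length interval with a midpoint rule. The function names and the number of integration steps are illustrative choices, not taken from the original analysis; the result is still per pair and per cM of genome.

```python
import math

def false_positive_density(x):
    """f(x) from eq. (5): density of false positive blocks x cM long,
    per pair of individuals and per cM of genome."""
    return math.exp(-13.0 - 2.0 * x + 4.3 * math.sqrt(x))

def false_positives_between(a, b, steps=1000):
    """Midpoint-rule approximation to the integral of f over [a, b] (in cM)."""
    h = (b - a) / steps
    return h * sum(false_positive_density(a + (k + 0.5) * h) for k in range(steps))
```

For example, `false_positives_between(2.0, 4.0)` gives the expected rate of false positive blocks between 2 and 4cM long under this fitted form.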

4.4 IBD rates along the genome 

To look for regions of unusual levels of IBD and to examine our assumption of uniformity, we compared the 
density of IBD tracts of different lengths along the genome, in supplemental figure S1. To do this, we first
divided blocks up into nonoverlapping bins based on length, with cutpoints at 1, 2.5, 4, 6, 8, and 10cM.
We then computed at each SNP the number of IBD blocks in each length bin that covered that site. To 
control for the effect of nearby SNP density on the ability to detect IBD, we then computed the residuals 
of a linear regression predicting number of overlapping IBD blocks using the density of SNPs within 3cM. 
To compare between bins, we then normalized these residuals, subtracting the mean and dividing by the 
standard deviation; these "z-scores" for each SNP are shown in figure S1.
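
The normalization just described can be sketched as follows, for a single length bin. This is a minimal illustration using ordinary least squares on one predictor; the function name and the input layout (one block count and one SNP density value per SNP) are assumptions made for the example.

```python
import statistics

def residual_z_scores(block_counts, snp_density):
    """Regress per-SNP block counts on local SNP density, then
    standardize the residuals (subtract mean, divide by SD)."""
    mean_x = statistics.mean(snp_density)
    mean_y = statistics.mean(block_counts)
    sxx = sum((x - mean_x) ** 2 for x in snp_density)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(snp_density, block_counts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x)
                 for x, y in zip(snp_density, block_counts)]
    center = statistics.mean(residuals)
    spread = statistics.pstdev(residuals)
    return [(r - center) / spread for r in residuals]
```

Running this once per length bin puts the bins on a common scale, so that unusually high or low IBD rates can be compared between bins.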



4.5 Correlations in IBD rates 



We noted repeated patterns of IBD sharing across multiple populations (seen in supplemental figure S3), in 
which certain sets of populations tended to show similar patterns of sharing. To quantify this, we computed 
correlations between mean numbers of IBD blocks of various lengths; supplemental figure S4 shows these
correlations for several length ranges. Specifically, if $I(x, y)$ is the mean number of IBD blocks
of the given length shared by an individual from population $x$ with a (different) individual from population
$y$, there are $n$ populations, and $I(x) = (1/(n-1)) \sum_{y \neq x} I(x, y)$, then figure S4 shows for each $x$ and $y$

$$\frac{1}{n-2} \sum_{z \notin \{x, y\}} \left(I(x,z) - I(x)\right)\left(I(y,z) - I(y)\right), \qquad (6)$$

the (Pearson) correlation between $I(x,z)$ and $I(y,z)$ ranging across $z \notin \{x,y\}$. Other choices of block
lengths are similar, although shorter blocks show higher overall correlations (due in part to false positives) 
and longer blocks show lower overall correlations (as rates are noisier, and sharing is more restricted to 
nearby populations). The groups were then chosen by visual inspection. 
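
A minimal Python sketch of the quantity in (6), computed exactly as the expression is written, is given below. The matrix layout (a square list of lists indexed by population) and the function names are assumptions for illustration; any further scaling by standard deviations is not shown because the text does not spell it out.

```python
def row_mean(I, x):
    """I(x): mean sharing of population x with every other population."""
    n = len(I)
    return sum(I[x][y] for y in range(n) if y != x) / (n - 1)

def sharing_statistic(I, x, y):
    """Expression (6): centered cross-product of the sharing profiles of
    populations x and y, summed over third populations z not in {x, y}."""
    n = len(I)
    mx, my = row_mean(I, x), row_mean(I, y)
    total = sum((I[x][z] - mx) * (I[y][z] - my)
                for z in range(n) if z not in (x, y))
    return total / (n - 2)
```

With a toy matrix of four populations, `sharing_statistic(I, 0, 1)` is positive when populations 0 and 1 share more than their own averages with the same third populations.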



4.6 Substructure 

We also assessed overall degrees of substructure within populations, i.e. the degree of inhomogeneity across 
individuals of population x for shared ancestry with population y (relative to that expected by chance). 
We measured inhomogeneity by the standard deviation in number of blocks shared with population y, 
across individuals of population x. We assessed the significance relative to a model of no substructure by a 
permutation test, randomly reassigning each block shared between x and y to an individual chosen uniformly
from population x, and recomputing the standard deviation, 1000 times. The resulting p-values are shown
in supplemental figure S2. We did not analyze these in detail, particularly as we had limited power to detect
substructure in populations with few samples, but note that a large proportion (47%) of the population pairs 
showed greater inhomogeneity than in all 1000 permuted samples (i.e. p < .001). Some comparisons even 
with many samples in both populations showed no structure whatsoever - in particular, the distribution of 
numbers of Italian IBD blocks shared by Swiss individuals is not distinguishable from Poisson, indicating a 
high degree of homogeneity of Italian ancestry across Switzerland. 
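
The permutation test described in this section can be sketched in a few lines of Python. This is an illustrative version under simple assumptions: the input is the number of blocks each individual of population x shares with population y, and the function name and the use of the population standard deviation are choices made for the example.

```python
import random
import statistics

def substructure_p_value(block_counts, n_perm=1000, rng=None):
    """Proportion of permuted data sets whose standard deviation of
    per-individual block counts is at least the observed one."""
    rng = rng or random.Random()
    n = len(block_counts)
    total_blocks = sum(block_counts)
    observed = statistics.pstdev(block_counts)
    hits = 0
    for _ in range(n_perm):
        # Reassign every block to an individual chosen uniformly at random.
        counts = [0] * n
        for _ in range(total_blocks):
            counts[rng.randrange(n)] += 1
        if statistics.pstdev(counts) >= observed:
            hits += 1
    return hits / n_perm
```

A population in which one individual carries nearly all shared blocks gives a p-value near zero, while perfectly even counts give a p-value of one.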



4.7 Inferring ages of common ancestors 

Here, our aim is to use the distribution of IBD block lengths to infer how long ago the genetic common 
ancestors were alive from which these IBD blocks were inherited. A pair of individuals who share a block 
of IBD of genetic length at least x have each inherited contiguous regions of genome from a single common 
ancestor n generations ago, and these regions overlap for length at least x. The common ancestors of two
individuals from which they might have inherited IBD blocks are those that can be connected to both by 
paths through the pedigree. Each link in such a path represents a meiosis (i.e. a reproduction) through 
which a certain amount of genetic material is passed, and the number of meioses is what determines the 
distribution of possible IBD blocks. 

Throughout the article we often informally refer to ancestors living a certain "number of generations in
the past" as if humans were semelparous with a fixed lifetime of roughly 30 years (Fenner 2005). Keeping
with this, it is natural to write the number of IBD blocks shared by a pair of individuals as the sum over
past generations of the number of IBD blocks inherited from that generation. More carefully, if $N(x)$ is the
number of IBD blocks of genetic length at least $x$ shared by two individual chromosomes, and $N_n(x)$ is the
number of such IBD blocks inherited by the two along paths through the pedigree with a total of $n$ meioses,
then $N(x) = \sum_n N_n(x)$. Therefore, averaging over possible choices of pairs of individuals, the mean number
of shared IBD blocks can be similarly partitioned as

$$E[N(x)] = \sum_{n \ge 1} E[N_n(x)]. \qquad (7)$$

In each successive generation in the past each chromosome is broken up into successively more pieces, each 
of which has been inherited along a different path through the pedigree. Returning to the semelparous 
example, any two such pieces of the two individual chromosomes that overlap and are inherited from the 
same ancestral chromosome contribute one block of IBD. Therefore, the mean number of IBD blocks coming 
from t generations ago is the mean number of such possibly overlapping pieces multiplied by the probability 
that a particular pair of these pieces are first descended from the same genealogical ancestor in generation t. 
Allowing for uncoordinated generations, let $K(n,x)$ denote the mean number of pieces of length at least $x$
obtained by cutting the chromosome at the recombination sites of $n$ meioses, and $\mu(n)$ the probability that
the two chromosomes have inherited at a particular site along a path of total length $n$ meioses (e.g. their
common ancestor at that site lived $n/2$ generations ago). Then $E[N_n(x)]$ is the product of these two terms,
so that

$$E[N(x)] = \sum_{n \ge 1} \mu(n) K(n,x), \qquad (8)$$

i.e. the mean rate of IBD is a linear function of the distribution of the time back to the most recent common
ancestor. The distribution $\mu(n)$ is more precisely known as the coalescent time distribution (Kingman 1982;
Wakeley 2005), in its obvious adaptation to pedigrees.

Furthermore, it is easy to calculate that for a chromosome of genetic length $G$, without interference (i.e.
Poisson recombination)

$$K(n, x) = (n(G - x) + 1) \exp(-xn). \qquad (9)$$

The mean number of IBD blocks of length at least $x$ shared by a pair of individuals across the entire genome
is then obtained by summing (8) across all chromosomes, and multiplying by four (for the four possible
chromosome pairs).
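
As a worked illustration of equations (8) and (9), the following Python sketch computes the expected number of shared blocks from a given distribution over numbers of meioses. Lengths are taken to be in Morgans, which is what the form of (9) suggests; the function names, the dictionary representation of the distribution, and the equal chromosome lengths used in the usage note are assumptions made for the example.

```python
import math

def mean_pieces(n, x, G):
    """K(n, x) from eq. (9): mean number of pieces of length at least x
    (Morgans) when a chromosome of length G is cut by n meioses."""
    if x > G:
        return 0.0
    return (n * (G - x) + 1.0) * math.exp(-x * n)

def expected_blocks(mu, x, chrom_lengths):
    """Eq. (8) summed over chromosomes and multiplied by four.
    `mu` maps a number of meioses n to the probability mu(n)."""
    per_chromosome_pair = sum(
        weight * sum(mean_pieces(n, x, G) for G in chrom_lengths)
        for n, weight in mu.items()
    )
    return 4.0 * per_chromosome_pair
```

For instance, with 22 hypothetical chromosomes of equal length summing to 32 Morgans, `expected_blocks({10: 0.001}, 0.0, [32 / 22] * 22)` is four times 0.001 times (32 x 10 + 22).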

Equations (8) and (9) give the relationship between lengths of shared IBD blocks and how long ago the
ancestor lived from whom these blocks are inherited. Our goal is to invert this relationship to learn about
$\mu(n)$, and hence the ages of the common ancestors underlying our observed distribution of IBD block lengths.
To do this, we first need to account for sampling noise and estimation error. Suppose we are looking at IBD
blocks shared between any of a set of $n_p$ pairs of individuals, and assume that $N(y)$, the number of observed
IBD blocks shared between any of those pairs of length at least $y$, is Poisson distributed with mean $n_p M(y)$,
where

$$M(y) = f(y) + \sum_{n \ge 1} \mu(n) \int_y^\infty \left( \int_0^G c(x) R(x,z)\, dK(n,x) \right) dz, \quad \text{with} \qquad (10)$$

$$R(x,y) = \begin{cases} \gamma(x)\, \lambda_+(x) \exp(-\lambda_+(x)(y - x)) & \text{for } y > x \\ (1 - \gamma(x))\, \lambda_-(x) \exp(-\lambda_-(x)(x - y)) / (1 - \exp(-\lambda_-(x) x)) & \text{for } y < x. \end{cases} \qquad (11)$$






Here the power $c(x)$ and the components of the error kernel $R(x, y)$ are estimated as above. The Poisson
assumption is reasonable because there is a very small chance of having inherited a block from each pair of
shared genealogical ancestors; there are a great number of these, and if these events are sufficiently independent,
the Poisson distribution will be a good approximation (see e.g. Grimmett and Stirzaker 2001). If this holds
for each pair of individuals, the total number of IBD blocks is also Poisson distributed, with $M$ given by the
mean of this number across all constituent pairs.



Unfortunately, the problem is ill-conditioned (the canonical example is the Laplace transform; Epstein
and Schotland 2008), which in this context means that the likelihood surface is flat in certain directions
("ridged"): for each IBD block distribution $N(x)$ there is a large set of coalescent time distributions $\mu(n)$ that
fit the data equally well. A common feature of such problems is that the unconstrained maximum likelihood
solution is wildly oscillatory; in our case, the unconstrained solution is not so obviously wrong, since we are
helped considerably by the knowledge that $\mu \ge 0$. For reviews of approaches to such ill-conditioned inverse
problems, see e.g. Petrov and Sizikov (2005) or Stuart (2010); the problem is also known as "data unfolding"
in particle physics (Cowan 1998). If one is concerned with finding a point estimate of $\mu$, most approaches
add an additional penalty to the likelihood, which is known as "regularization" (Tikhonov and Arsenin
1977) or "ridge regression" (Hoerl and Kennard 1970). However, our goal is parametric inference, and so
we must describe the limits of the "ridge" in the likelihood surface in various directions (which can be seen
as maximum a posteriori estimates under priors of various strengths).

To do this, we first discretize the data, so that $N_i$ is the number of IBD blocks with inferred genetic
lengths falling between $x_{i-1}$ and $x_i$, with a minimum length of 2cM, so that $x_0 = 2$. We
then compute by numerical integration the matrix $L$ discretizing the kernel given in (10), so that
$L_{in} = \int_{x_{i-1}}^{x_i} \int_0^G c(x) R(x, z)\, dK(n, x)\, dz$ is the kernel that applied to $\mu$ gives the mean number of true IBD blocks per
pair observed with lengths between $x_{i-1}$ and $x_i$, and $f_i = \int_{x_{i-1}}^{x_i} f(z)\, dz$ is the mean number of false positives
per pair with lengths in the same interval. We then sum across chromosomes, as before. The likelihood of
the data is thus

$$\prod_i \frac{1}{N_i!} \exp\left(-n_p \left(\sum_n L_{in}\mu_n + f_i\right)\right) \left(n_p \left(\sum_n L_{in}\mu_n + f_i\right)\right)^{N_i}. \qquad (12)$$



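To the (negative) log likelihood we add a penalization $\gamma$, and use numerical optimization (optim in R;
R Development Core Team 2012) to minimize the resulting functional (which omits terms independent of $\mu$)

$$\mathcal{L}(\mu; \gamma, N) = \sum_i \left( n_p \sum_n L_{in}\mu_n - N_i \log\left(\sum_n L_{in}\mu_n + f_i\right) \right) + \gamma(\mu). \qquad (13)$$

Often we will fix the functional form of the penalization and vary its strength, so that $\gamma(\mu) = \gamma_0 z(\mu)$, in
which case we will write $\mathcal{L}(\mu; \gamma_0, N)$ for $\mathcal{L}(\mu; \gamma_0 z(\mu), N)$.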

For instance, the leftmost panels in figure 4 show the minimizing solutions $\mu$ for $\gamma = 0$ (no penalization)
and for $\gamma(\mu) = \gamma_0 \sum_n (\mu_{n+1} - \mu_n)^2$ ("roughness" penalization). Because our aim is to describe extremal
reasonable estimates $\mu$, in this and in other cases, we have chosen the strength of penalization $\gamma_0$ to be "as
large as is reasonable", choosing the largest $\gamma_0$ such that the minimizing $\mu$ has log likelihood differing by no
more than 2 units from the unconstrained optimum. This choice of cutoff can be justified as in Edwards
(1984), and gave quite similar answers to other methods. This can be thought of as taking the strongest
prior that still gives us "reasonable" maximum a posteriori answers. Note that the optimization is over
nonnegative distributions $\mu$ also satisfying $\sum_n \mu(n) \le 1$ (although the latter condition does not enter in
practice).
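
The penalized objective can be sketched in Python as follows. This is a minimal illustration that assumes the Poisson model stated earlier (counts $N_i$ pooled over $n_p$ pairs, with per-pair means given by the kernel matrix plus false positives) and drops terms that do not depend on $\mu$; the function names and argument layout are hypothetical, and the original analysis used `optim` in R rather than this code.

```python
import math

def neg_log_likelihood(mu, L, f, N, n_pairs):
    """Poisson negative log likelihood, up to terms independent of mu.
    L[i][n] is the kernel, f[i] the false positive mean per pair,
    N[i] the observed count in length bin i."""
    total = 0.0
    for L_i, f_i, N_i in zip(L, f, N):
        mean_true = sum(l * m for l, m in zip(L_i, mu))
        total += n_pairs * mean_true - N_i * math.log(mean_true + f_i)
    return total

def roughness(mu):
    """Roughness penalty: sum of squared successive differences of mu."""
    return sum((b - a) ** 2 for a, b in zip(mu, mu[1:]))

def objective(mu, L, f, N, n_pairs, gamma0):
    """Penalized functional to be minimized over nonnegative mu."""
    return neg_log_likelihood(mu, L, f, N, n_pairs) + gamma0 * roughness(mu)
```

Increasing `gamma0` from zero and re-minimizing, until the unpenalized term has risen by 2 units, reproduces the "as large as is reasonable" rule described above; the minimizer itself is left to any bound-constrained optimizer.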

We would also like to determine bounds on total numbers of shared genetic ancestors who lived during
particular time intervals, by determining e.g. the minimum and maximum numbers of such ancestors that
are consistent with the data. Such bounds are shown in figure 5. To obtain a lower bound for the time period
between $n_1$ and $n_2$ generations, we penalized the total amount of shared ancestry during this interval, using
the penalization $\gamma_-(\mu) = \gamma_0^- \sum_{n=n_1}^{n_2} \mu(n)$, choosing $\gamma_0^-$ to give a drop of 2 log likelihood units, as
described above. The lower bound is then the total amount of coalescence in this interval for the $\mu$ minimizing
$\mathcal{L}(\cdot; \gamma_-, N)$. The upper bound is found by penalizing total shared ancestry outside this interval, i.e. by
applying the penalization $\gamma_+(\mu) = \gamma_0^+ \left( \sum_{n<n_1} \mu(n) + \sum_{n>n_2} \mu(n) \right)$. It is almost always the case that lower
bounds are zero, since there is sufficient wiggle room in the likelihood surface to explain the observed block
length distribution using peaks just below $n_1$ and above $n_2$. Examples are shown in supplemental figure S6.
On the other hand, upper bounds seem fairly reliable.

In the above we have assumed that the minimizer of $\mathcal{L}$ is unique, thus glossing over e.g. finding appropriate
starting points for the optimization. In practice, we obtained good starting points by solving the natural
approximating least-squares problem, using quadprog (Turlach and Weingessel 2011) in R. We then evaluated
uniqueness of the minimizer by using different starting points, and found that if necessary, adding only a very 
small penalization term was enough to ensure convergence to a unique solution. We furthermore produced 
simulated data using a variety of distributions of common ancestors, confirming that the method behaves as 
expected (results not shown). 



Extending to shorter blocks We only used blocks longer than 2cM to infer ages of common ancestors,
in part because the model we use does not seem to fit the data below this threshold. Attempts to apply the
methods to all blocks longer than 1cM reveal that there is no history of rates of common ancestry that,
under this model, produces a block length distribution reasonably close to the one observed - small, but
significant deviations occur in the rates of short blocks. This probably occurs in part because our estimate
of false positive rate is expected to be less accurate at these short lengths. Furthermore, our model does
not explicitly model the overlap of multiple short IBD segments deriving from different ancestors to create
one long segment, which could start to have a significant effect at short lengths. (The effect on long blocks
we model as error in length estimation.) This could be incorporated into a model (in a way analogous to
Li and Durbin (2011)), but consideration of when several contiguous blocks of IBD might have few enough
differences to be detected as a long IBD block quickly runs into the need for a model of IBD detection,
which we necessarily treat as a black box. Use of these shorter blocks, which would allow inference of older
ancestry, will need different methods, and probably sequencing rather than genotyping data.



4.8 Numbers of common ancestors 

Estimated numbers of genetic common ancestors can be found by simply solving for $N(0)$ using an estimate
of $\mu$ in equations (8) and (9) (still restricting to genetic ancestors on the autosomes). These tell us that
given the distribution $\mu(n)$, the mean number of genetic common ancestors coming from generation $n$ -
i.e. the mean number of IBD blocks of any length inherited from such common ancestors - is $N(0) =
\mu(n) \sum_{k=1}^{22} (n G_k + 1)$, where $G_k$ is the total genetic length of the $k$th human chromosome. Since the total
map length of the human autosomes is about 32 Morgans, this is about $\mu(n)(32n + 22)$. This procedure has
been used in figures 4 and 5.

Converting shared IBD blocks to numbers of shared genealogical common ancestors is more problematic.
Suppose that modern-day individuals $a$ and $b$ both have $c$ as a grand$^{n-1}$parent. Using equation (9) at $x = 0$,
we know that the mean number of blocks that $a$ and $b$ both inherit from $c$ is $r(2n)$, with $r(n) := 2^{-n}(32n + 22)$,
since each block has chance $2^{-2n}$ of being inherited across $2n$ meioses. First treat the endpoints of each
distinct path of length $n$ back through the pedigree as a grand$^{n-1}$parent, so that everyone has exactly $2^n$
grand$^{n-1}$parents, and some ancestors will be grand$^{n-1}$parents many times over. Then if $a$ and $b$ share $m$
genetic grand$^{n-1}$parents, a moment estimator for the number of genealogical grand$^{n-1}$parents is $m/r(n)$.
However, the geometric growth of $r(n)$ means that small uncertainties in $n$ have large effects on the estimated
numbers of genealogical common ancestors - and we have large uncertainties in $n$.

Despite these difficulties, we can still get some order-of-magnitude estimates. For instance, we estimate 
that someone from Hungary shares on average about 5 genetic common ancestors with someone from the 
UK between 18 and 50 generations ago. Since $1/r(36) = 5.8 \times 10^7$, we would conservatively estimate that for
every genetic common ancestor there are tens of millions of genealogical common ancestors. Most of these 
ancestors must be genealogical common ancestors many times over, but these must still represent at least 
thousands of distinct individuals. 
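
The order-of-magnitude figure quoted above can be checked with a short Python calculation. The function below simply evaluates $r(n) = 2^{-n}(32n + 22)$ as defined in this section; the default arguments (32 Morgans of map length, 22 autosomes) come from the text, while the function and parameter names are illustrative.

```python
def r(n, map_length_morgans=32.0, n_autosomes=22):
    """Mean number of blocks two individuals both inherit from one
    ancestor along a given path of n meioses."""
    return 2.0 ** (-n) * (map_length_morgans * n + n_autosomes)

# Genealogical common ancestors implied per genetic common ancestor
# at 36 meioses (18 generations on each side).
genealogical_per_genetic = 1.0 / r(36)
```

Here `genealogical_per_genetic` is 2^36 / 1174, a little under 5.9 x 10^7, in line with the "tens of millions" stated above.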
Figure S1: Normalized density of IBD blocks of different lengths, corrected for SNP density, across all
autosomes (see section 4.4 for details). Marked with a grey bar and "c" are the centromeres; and marked
with "8p" is a large, segregating inversion (Giglio et al. 2001). The grey curve along the bottom shows
normalized SNP density.
Figure S2: Two measures of overdispersal of block numbers across individuals (i.e. substructure): Suppose
we have $n$ individuals from population $x$, and $N_{iy}$ is the number of IBD blocks of length at least 1cM that
individual $i$ shares with anyone from population $y$. Our statistic of substructure within $x$ with respect to
$y$ is the variance of these numbers, $s^2_{xy} = \frac{1}{n} \sum_i N_{iy}^2 - \left( \frac{1}{n} \sum_i N_{iy} \right)^2$. We obtained a "null" distribution
for this statistic by randomly reassigning all blocks shared between $x$ and $y$ to an individual from $x$, and
used this to evaluate the strength and the statistical significance of this substructure. (A) Histogram of the
"p-value", i.e. the proportion of 1000 replicates that showed a variance greater than or equal to the observed
variance $s^2_{xy}$, for all pairs of populations $x$ and $y$ with at least 10 individuals in population $y$. (B) The "z
score", which is the observed value $s^2_{xy}$ minus the mean value, divided by the standard deviation, estimated using 1000
replicates. The population $x$ is shown on the vertical axis, with text labels giving $y$, so for instance, Italians
show much more substructure with most other populations than do Irish. Note that sample size still has a
large effect - it is easier to see substructure with respect to the Swiss French ($x$ = CHf) because the large
number of Swiss French samples allows greater resolution. A vertical line is shown at $z = 5$. Only pairs of
populations with at least 3 samples in country $x$ and 10 samples in country $y$ are shown. Because of the log
scale, only pairs with a positive z score are shown, but no comparisons had $z < -2.5$, and only three had
$z < -2$.
Figure S3: (A) Mean numbers of IBD blocks of length at least 1cM per pair of individuals, shown as a
modified Cleveland dotchart, with ±2 standard deviations shown as horizontal lines. For instance, on the
bottom row we see that someone from the UK shares on average about one IBD block with someone else from
the UK and slightly less than 0.2 blocks with someone from Turkey. Note that in most cases, the distribution
of block numbers is fairly concentrated, and that nearby populations show quite similar patterns.






Figure S4: Correlations in IBD rates, for six different length windows (omitted length windows are similar).
If there are $n$ populations, $I(x,y)$ is the mean number of blocks in the given length range shared by a pair
from populations $x$ and $y$, and $I(x) = (1/(n-1)) \sum_{z \neq x} I(x,z)$, then shown is
$(1/(n-2)) \sum_{z \notin \{x,y\}} (I(x,z) - I(x))(I(y,z) - I(y))$.
Figure S5: Estimated total numbers of genetic common ancestors shared by various pairs of populations,
in roughly the time periods 0-500ya, 500-1500ya, 1500-2500ya, and 2500-4300ya. The two sets of figures
are identical, except with different limits on the vertical axis - this is done because of the large differences in
scale between different populations. The population groupings are: "AL", Albanian speakers (Albania and
Kosovo); "S-C", Serbo-Croatian speakers in Bosnia, Croatia, Serbia, Montenegro, and Yugoslavia; "R-B",
Romania and Bulgaria; "UK", United Kingdom, England, Scotland, Wales; "Iber", Spain and Portugal;
"Bel", Belgium and the Netherlands; "Bal", Latvia, Finland, Sweden, Norway, and Denmark; other labels
denote a single population, with the same abbreviations as in table 1.
Figure S6: An example of the set of consistent histories (numbers of genetic common ancestors back
through time) used to find upper and lower bounds in figures S5 and 5. The example shown is Poland-
Germany; "pointy" is the maximum likelihood history; "smooth" is the smoothest consistent history; and
the remaining plots show the histories giving lower and upper bounds for the referenced time intervals (in
numbers of generations). In each case, the segment of time on which we are looking for a bound is shaded.
Figure S7: The maximum likelihood history (grey) and smoothest consistent history (red) for all pairs of
population groupings of figure S5 (including those of figure 5). Each panel is analogous to a panel of figure 4;
time scale is given by vertical grey lines every 500 years. For these plots on a larger scale see supplemental
figure S8.
| COUNTRY_SELF | COUNTRY_GFOLX | PRIMARY_LANGUAGE | Population | n |
|---|---|---|---|---|
| Albania | Albania | Albanian | Albania | 3 |
| Yugoslavia | Serbia | Albanian | Albania | 1 |
| Yugoslavia | Yugoslavia | Albanian | Albania | 5 |
| Austria | | German | Austria | 3 |
| Austria | Austria | German | Austria | 10 |
| Spain | Austria | German | Austria | 1 |
| Belgium | Belgium | Dutch | Belgium | 4 |
| Belgium | Belgium | Flemish | Belgium | 3 |
| Belgium | Belgium | French | Belgium | 28 |
| Germany | Belgium | French | Belgium | 1 |
| Switzerland | Belgium | French | Belgium | 1 |
| Bosnia | Bosnia | Bosnian | Bosnia | 4 |
| Bosnia | Bosnia | Serbian | Bosnia | 1 |
| Bosnia | Bosnia | Serbo-Croatian | Bosnia | 4 |
| Bulgaria | Bulgaria | Bulgarian | Bulgaria | 1 |
| Croatia | | Croatian | Croatia | 1 |
| Croatia | Croatia | Croatian | Croatia | 6 |
| Yugoslavia | Yugoslavia | Croatian | Croatia | 1 |
| Croatia | Croatia | Serbo-Croatian | Croatia | 1 |
| Cyprus | | English | Cyprus | 1 |
| Cyprus | | Greek | Cyprus | 1 |
| Cyprus | Cyprus | Greek | Cyprus | 1 |
| Czech Republic | Czech Republic | Czech | Czech Republic | 9 |
| Denmark | | Danish | Denmark | 1 |
| England | England | English | England | 18 |
| Turkey | England | English | England | 1 |
| United Kingdom | England | English | England | 3 |
| Finland | Finland | Finnish | Finland | 1 |
| France | | French | France | 2 |
| France | France | French | France | 82 |
| Germany | France | French | France | 1 |
| Switzerland | France | French | France | 1 |
| Germany | | | Germany | 1 |
| Germany | | English | Germany | 2 |
| Germany | Germany | French | Germany | 1 |
| Germany | | German | Germany | 1 |
| Germany | Germany | German | Germany | 63 |
| Switzerland | Germany | German | Germany | 1 |
| Hungary | Germany | Hungarian | Germany | 1 |
| Germany | Germany | Polish | Germany | 1 |
| Switzerland | Greece | French | Greece | 1 |
| Greece | Greece | Greek | Greece | 4 |

(continued on next page)

Table S1: The composition of our populations. "COUNTRY_SELF" is the reported country of origin;
"COUNTRY_GFOLX" is the country of origin of all reported grandparents (individuals with reported
grandparents from different countries were removed); "PRIMARY_LANGUAGE" is the reported primary language;
"Population" is our population label; and n gives the number of individuals falling in this category.
| COUNTRY_SELF | COUNTRY_GFOLX | PRIMARY_LANGUAGE | Population | n |
|---|---|---|---|---|
| Hungary | Hungary | French | Hungary | 1 |
| Hungary | Hungary | Hungarian | Hungary | 17 |
| Hungary | Hungary | Russian | Hungary | 1 |
| Ireland | | | Ireland | 19 |
| Ireland | | English | Ireland | 38 |
| England | Ireland | English | Ireland | 1 |
| Ireland | Ireland | English | Ireland | 1 |
| Ireland | Ireland | French | Ireland | 1 |
| Italy | | | Italy | 1 |
| France | Italy | French | Italy | 1 |
| Italy | Italy | French | Italy | 8 |
| Switzerland | Italy | French | Italy | 9 |
| Italy | Italy | German | Italy | 1 |
| Italy | | Italian | Italy | 3 |
| France | Italy | Italian | Italy | 1 |
| Italy | Italy | Italian | Italy | 170 |
| Romania | Italy | Italian | Italy | 1 |
| Sweden | Italy | Italian | Italy | 1 |
| Switzerland | Italy | Italian | Italy | 17 |
| Kosovo | | | Kosovo | 1 |
| Yugoslavia | Kosovo | Albanian | Kosovo | 10 |
| Yugoslavia | Kosovo | Kosovan | Kosovo | 2 |
| Yugoslavia | Kosovo | Serbo-Croatian | Kosovo | 2 |
| Latvia | Latvia | Latvian | Latvia | 1 |
| Macedonia | Macedonia | Macedonian | Macedonia | 4 |
| Yugoslavia | Montenegro | Serbian | Montenegro | 1 |
| Netherlands | Netherlands | Dutch | Netherlands | 15 |
| Holland | | English | Netherlands | 1 |
| Netherlands | Netherlands | French | Netherlands | 1 |
| Norway | Norway | Norwegian | Norway | 2 |
| France | Poland | French | Poland | 1 |
| Poland | | Polish | Poland | 4 |
| France | Poland | Polish | Poland | 2 |
| Poland | Poland | Polish | Poland | 15 |
| France | Portugal | Portuguese | Portugal | 1 |
| Portugal | Portugal | Portuguese | Portugal | 114 |
| Romania | Romania | Romanian | Romania | 14 |
| Romania | Russia | Romanian | Russia | 1 |
| Russia | Russia | Russian | Russia | 5 |
| Scotland | | English | Scotland | 3 |
| Scotland | Scotland | English | Scotland | 2 |
| Yugoslavia | Serbia | Hungarian | Serbia | 1 |
| Serbia | Serbia | Serbian | Serbia | 1 |
| Yugoslavia | Serbia | Serbian | Serbia | 4 |
| Yugoslavia | Yugoslavia | Serbian | Serbia | 2 |
| Croatia | Serbia | Serbo-Croatian | Serbia | 1 |
| Yugoslavia | Serbia | Serbo-Croatian | Serbia | 2 |

(continued on next page)

Table S2: Continuation of table S1.
| COUNTRY_SELF | COUNTRY_GFOLX | PRIMARY_LANGUAGE | Population | n |
|---|---|---|---|---|
| Slovakia | Slovakia | Slovakian | Slovakia | 1 |
| Italy | Slovenia | Slovene | Slovenia | 1 |
| Slovenia | Slovenia | Slovene | Slovenia | 1 |
| Spain | Spain | Columbia | Spain | 2 |
| Switzerland | Spain | Columbia | Spain | 2 |
| Spain | Spain | French | Spain | 5 |
| Switzerland | Spain | French | Spain | 2 |
| Spain | Spain | Galician | Spain | 2 |
| Spain | | Spanish | Spain | 4 |
| Spain | Spain | Spanish | Spain | 106 |
| Switzerland | Spain | Spanish | Spain | 7 |
| Sweden | | | Sweden | 1 |
| Sweden | Sweden | Swedish | Sweden | 9 |
| Switzerland | | French | Swiss French | 1 |
| Belgium | Switzerland | French | Swiss French | 1 |
| Czech Republic | Switzerland | French | Swiss French | 1 |
| France | Switzerland | French | Swiss French | 7 |
| Poland | Switzerland | French | Swiss French | 1 |
| Portugal | Switzerland | French | Swiss French | 1 |
| Spain | Switzerland | French | Swiss French | 1 |
| Switzerland | Switzerland | French | Swiss French | 826 |
| Switzerland | Switzerland | German | Swiss German | 103 |
| Italy | Switzerland | Italian | Switzerland | 2 |
| Switzerland | Switzerland | Italian | Switzerland | 12 |
| Switzerland | Switzerland | Patois | Switzerland | 1 |
| Switzerland | Switzerland | Romansch | Switzerland | 1 |
| Spain | Switzerland | Spanish | Switzerland | 1 |
| Turkey | Turkey | Turkish | Turkey | 4 |
| Ukraine | Ukraine | Ukranian | Ukraine | 1 |
| United Kingdom | | | United Kingdom | 87 |
| United Kingdom | | English | United Kingdom | 270 |
| United Kingdom | United Kingdom | English | United Kingdom | 1 |
| Yugoslavia | | | Yugoslavia | 1 |
| Yugoslavia | Yugoslavia | French | Yugoslavia | 1 |
| Yugoslavia | Yugoslavia | Romanian | Yugoslavia | 1 |
| Yugoslavia | Yugoslavia | Serbo-Croatian | Yugoslavia | 3 |
| Yugoslavia | Yugoslavia | Yugoslavian | Yugoslavia | 4 |

Table S3: Continuation of table S2.
http://www.eve.ucdavis.edu/~plralph/ibd/boxplotted-inversions.pdf



Figure S8: All inversions shown in S7, one per page (225 pages total). There is one page per pair of
comparisons used in figure 5. On each page, there is one large plot, showing 10 distinct consistent histories
(numbers of genetic ancestors back through time), and below are 10 histograms of IBD block length, one
for each consistent history, showing both the observed distribution and the partitioning of blocks into age
categories predicted by that history. The names of the two groupings are shown in the upper right: "pointy"
is the unconstrained maximum likelihood solution; "smooth" is the smoothest consistent history; "a-b lower"
is the history used to find the lower bound for the time period a-b generations ago in figure 5; and "a-b
upper" is the history used to find the corresponding upper bound. Each of these is described in more detail
in the Methods.